An Amalgam of R
A Primer and Reference for Biologists
2024-12-13
Preface
What this book is about
This book explores the ever-expanding universe of R. Specifically, it considers:
The historical development of the R language, the R engine, and the installation of R (Ch 1)
The creation of R objects and their fundamental characteristics (Ch 2)
R data storage entities, and the import and export of user data files (Ch 3)
Data management approaches using base R (Ch 4) and the tidyverse (Ch 5)
R approaches to graphics, including base plotting methods (Ch 6) and the ggplot2 package (Ch 7)
R functions (Ch 8), including loops and the creation of user-defined classes and generic methods
Interfacing R with other languages (e.g., C, Fortran, C++, Python) and software environments (Ch 9)
Building custom R packages (Ch 10)
Interactive R interfaces and web applications, including approaches from the packages tcltk, plotly and shiny (Ch 11)
The fundamental ways that R interacts with your computer (Ch 12)
While this book covers a lot of ground, clearly many other topics could be considered. Subjects explored are those I have found to be particularly useful or interesting during my 20+ years of using R as a biologist and statistician. Chapters concerning advanced topics (i.e., Chs 8-12) are intended to be starting points for further exploration, and the reader is directed to additional resources when necessary.
My view is that R is an important computer language. While ignored in many phylogenies of computer languages (e.g., Boutin et al. 2002), R has had a large, devoted following for decades, and its computational engine and language can be clearly linked to seminal concepts and advances in computer science. Further, from its inception R has been a tool for metaprogramming, wherein code is shared and modified programmatically. For instance, R has a wide variety of widely used APIs for languages like C, Fortran, C++, Java, and Python, among many others.
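As a small illustration of code being modified programmatically, the following sketch captures an unevaluated call, swaps the function it invokes, and then evaluates the result. The object names (expr, x, result) are illustrative assumptions, and this is only one of many metaprogramming idioms available in base R:

```r
# Capture an unevaluated expression (a "call" object) rather than its value
expr <- quote(mean(x))

# Modify the code itself: replace the function being called
expr[[1]] <- as.name("median")

# Supply data and evaluate the modified call
x <- c(1, 2, 10)
result <- eval(expr)
```

Here deparse(expr) returns "median(x)", and result holds the median of x rather than its mean.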
Individuals from the natural sciences, particularly biologists, are likely to find this book more useful than individuals from other backgrounds because coding examples and applications are generally biological. Non-biologists may find, however, that examples readily extend to other settings.
What this book is not about
Notably, although statistics is the primary purpose of R, the primary emphasis of this book is not statistics. Instead, I focus on the R language, and the characteristics, capabilities, and extensions of the R system. I take this approach because: 1) coverage of non-statistical topics is challenging in and of itself, and 2) the responsible introduction of statistical algorithms from any program or language (including R) should be accompanied by detailed information concerning the statistical procedures. Many pedagogic resources exist for the statistical application of R. These include Aho (2014) (the pedagogic statistical companion to this book), Venables and Ripley (2002), Faraway (2004, 2016), Crawley (2012), and Fox and Weisberg (2019), among others. It should be noted that while this text does not focus on inferential statistical methods, it does emphasize methods for handling, summarizing and displaying empirical data, and these steps may serve as a prerequisite for formal inferential analyses.
Distinguishing characteristics of this book
Many other sources have emphasized fundamental programming aspects of R while largely ignoring statistics. These include seminal texts (e.g., Chambers 2008, 2020; Wickham 2016, 2021) and definitive CRAN manuals (R Core Team 2024a, 2024b, 2024c). Others have focused on particular, potentially non-statistical R attributes, including graphics (Wickham 2016; Murrell 2019) and web-based applications (Wickham 2021; Sievert 2020). This book is a brave (or foolish) attempt to amalgamate and distill this disparate information, while occasionally emphasizing topics earlier works have ignored. For instance, Wickham (2019) admirably emphasizes many foundational and advanced programming ideas in R, but does not thoroughly consider some important programming extensions, including powerful syntheses with Python and Tcl. Unlike many other texts, this book also adheres to the format of a textbook, with numerous worked (often biological) examples, and exercises at the end of each chapter.
Conventions
This document has been created with Windows users of R in mind. In the vast majority of cases, however, instructions and examples will be extendable to other operating systems. In cases where this is not true, I note steps to address these inconsistencies.
Several conventions are followed throughout the text. R package names and important terms are italicized. R function names, function arguments and objects are written in blocked Courier font. Functions and operations are often written into “chunks” whose contents are readily copied to a clipboard using an icon located at the top right of the chunk (HTML version of text only). For example:
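A minimal sketch of such a chunk follows. The object name msg and the use of print() are assumptions chosen for illustration; any expression that prints a character string would behave similarly:

```r
# Illustrative chunk: store a character string and print it to the console
msg <- "hello world"
print(msg)
```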
The output from an evaluated chunk is generally printed immediately below. For example:
[1] "hello world"